Audio to Score Matching by Combining Phonetic and Duration Information

Authors

Rong Gong

Jordi Pons

Xavier Serra

Abstract

We approach the problem of matching singing-phrase audio to a score by using phonetic and duration information, with a focus on the jingju a cappella singing case. We argue that, because each mode in jingju music has a basic melodic contour, using only melodic information (such as the pitch contour) results in ambiguous matching. This leads us to propose a matching approach based on phonetic and duration information. Phonetic information is extracted with an acoustic model shaped with our data, and duration information is considered with the Hidden Markov Model (HMM) variants we investigate. We build a model for each lyric path in our scores, and we achieve the matching by ranking the posterior probabilities of the decoded most likely state sequences. Three acoustic models are investigated: (i) convolutional neural networks (CNNs), (ii) deep neural networks (DNNs) and (iii) Gaussian mixture models (GMMs). Two duration models are also compared: (i) a hidden semi-Markov model (HSMM) and (ii) a post-processor duration model. Results show that CNNs perform better on our (small) audio dataset and that the HSMM outperforms the post-processor duration model.
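The decode-and-rank idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain left-to-right HMM with a fixed self-transition probability (not the HSMM or the post-processor duration model), it ranks candidates by the log score of the best Viterbi path, and the per-frame phone probabilities are made-up values standing in for the output of an acoustic model. All function and variable names (`viterbi_score`, `rank_candidates`, `phrase_a`, `phrase_b`) are hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi_score(log_obs, phones, log_stay=math.log(0.5), log_move=math.log(0.5)):
    """Log score of the best path through a left-to-right HMM.

    log_obs[t][p] is the log observation probability of phone index p at frame t.
    `phones` is the phone sequence of one candidate lyric path; each phone is one
    state. The path must start in the first state and end in the last one.
    """
    n_states = len(phones)
    # Frame 0: only the first state is allowed.
    prev = [NEG_INF] * n_states
    prev[0] = log_obs[0][phones[0]]
    for t in range(1, len(log_obs)):
        cur = [NEG_INF] * n_states
        for s in range(n_states):
            stay = prev[s] + log_stay  # remain in the same phone
            move = prev[s - 1] + log_move if s > 0 else NEG_INF  # advance one phone
            best = max(stay, move)
            if best > NEG_INF:
                cur[s] = best + log_obs[t][phones[s]]
        prev = cur
    return prev[-1]


def rank_candidates(log_obs, candidates):
    """Score every candidate phone sequence and sort from best to worst."""
    scored = [(name, viterbi_score(log_obs, phones)) for name, phones in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy input (assumption): 6 frames, 3 phone classes, probabilities per frame.
frame_probs = [
    [0.8, 0.1, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
log_obs = [[math.log(p) for p in frame] for frame in frame_probs]

# Two hypothetical lyric paths, written as sequences of phone indices.
candidates = {"phrase_a": [0, 1, 2], "phrase_b": [2, 1, 0]}

ranking = rank_candidates(log_obs, candidates)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting every complete path makes the same number of transitions, so the ranking is driven by how well each phone sequence fits the frame-wise observations. A duration-aware variant would replace the fixed self-transition probability with an explicit model of how long each state is occupied.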

Similar resources

Optimizing image steganography by combining the GA and ICA

In this study, a novel approach that uses a combination of steganography and cryptography to hide information in digital images as host media is proposed. In the process, secret data is first encrypted using the mono-alphabetic substitution cipher method, and then the encrypted secret data is embedded inside an image using an algorithm which combines the random patterns based on Space Fillin...

Consequences of Traffic Accidents Transferred by Helicopter and Ground Ambulance: Propensity Score Matching

Introduction: The main task of the emergency medical system is to provide primary care and transfer the injured to a medical center. Studies have been conducted to investigate the outcome of helicopter and ground ambulance casualties, but they show different results. These different results may be due to the type of study, statistical methods, differences in pre-hospital emergency systems, an...

Using Natural Language Input and Audio Analysis for a Human-Oriented MIR System

In this paper we will present a MIR (Music Information Retrieval) system using natural language as input for human-oriented queries to large-scale music collections, applicable in web databases, cd-chargers for cars, or mobile services. The outlined system is a full-fledged architecture combining state-of-the-art approaches from the fields of natural language understanding including phonetic ma...

Production of English Lexical Stress by Persian EFL Learners

This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...

A Lyrics-Matching QBH System for Interactive Environments

Query-by-Humming (QBH) is an increasingly prominent technology that allows users to browse through a song database by singing/humming a part of the song they wish to retrieve. Besides these cases, QBH can also be used to track the performance of a user in applications such as Score Alignment and Real-Time Accompaniment. In this paper we present an online QBH algorithm for audio recordings of si...

Publication date: 2017
